Dataset | Year | Programming Language | Data Source | Download Link |
---|---|---|---|---|
BigCloneBench | 2014 | Java | GitHub | Download |
OJ dataset | 2016 | C++ | OJ Platform | Download |
CodeSearchNet | 2019 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
Code2Seq | 2019 | Java | GitHub | Download |
Devign | 2019 | C | GitHub | Download |
Google Code Jam (GCJ) | 2020 | C++, Java | OJ Platform | Download |
CodeXGLUE | 2021 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
CodeQA | 2021 | Java, Python | GitHub | Download |
APPS | 2021 | Python | OJ Platform | Download |
Shellcode_IA32 | 2021 | Assembly language | OJ Platform | Download |
SecurityEval | 2022 | Python | GitHub | Download |
LLMSecEval | 2023 | Python, C | GitHub | Download |
PoisonPy | 2023 | Python | GitHub | not yet published |

Attack Technique | Year | Venue | Attack Type | Target Models | Target Tasks |
---|---|---|---|---|---|
Ramakrishnan et al. | 2020 | arXiv | Data poisoning | Code2Seq, Seq2Seq | Code summarization, Method name prediction |
Schuster et al. | 2021 | USENIX Security | Data poisoning, Model poisoning | Pythia, GPT-2 | Code completion |
Severi et al. | 2021 | USENIX Security | Data poisoning | LightGBM, EmberNN, Random Forest, Linear SVM | Malware classification |
CodePoisoner | 2022 | arXiv | Data poisoning | LSTM, TextCNN, Transformer, CodeBERT | Code defect detection, Code clone detection, Code repair |
Wan et al. | 2022 | ESEC/FSE | Data poisoning | BiRNN, Transformer, CodeBERT | Code search |
BADCODE | 2023 | ACL | Data poisoning | CodeBERT, CodeT5 | Code search |
Cotroneo et al. | 2023 | arXiv | Data poisoning | Seq2Seq, CodeBERT, CodeT5+ | Code generation |
AFRAIDOOR | 2023 | arXiv | Data poisoning | CodeBERT, CodeT5, PLBART | Code summarization |
PELICAN | 2023 | USENIX Security | Data poisoning | BiRNN-func, XDA-func, XDA-cell, StateFormer, EKLAVYA, EKLAVYA++, in-nomine, in-nomine++, S2V, S2V++, Trex, SAFE, SAFE++, S2V-B, S2V-B++ | Binary code analysis |
Li et al. | 2023 | ACL | Model poisoning | PLBART, CodeT5 | Code defect detection, Code clone prediction, Code2Code translation, Text2Code translation, Code refinement |
BadCS | 2023 | arXiv | Model poisoning | BiRNN, Transformer, CodeBERT, GraphCodeBERT | Code search |